-
Cancers are typically fueled by sequential accumulation of driver mutations in a previously healthy cell. Some of these mutations, such as inactivation of the first copy of a tumor suppressor gene, can be neutral, and some, like those resulting in activation of oncogenes, may provide cells with a selective growth advantage. We study a multi-type branching process that starts with healthy tissue in homeostasis and models accumulation of neutral and advantageous mutations on the way to cancer. We provide results regarding the sizes of premalignant populations and the waiting times to the first cell with a particular combination of mutations, including the waiting time to malignancy. Finally, we apply our results to two specific biological settings: initiation of colorectal cancer and age incidence of chronic myeloid leukemia. Our model allows for any order of neutral and advantageous mutations and can be applied to other evolutionary settings.
-
Quantitative structure-activity relationship (QSAR) modeling is a powerful tool for drug discovery, yet the lack of interpretability of commonly used QSAR models hinders their application in molecular design. We propose a similarity-based regression framework, topological regression (TR), that offers a statistically grounded, computationally fast, and interpretable technique to predict drug responses. We compare the predictive performance of TR on 530 ChEMBL human target activity datasets against that of deep-learning-based QSAR models. Our results suggest that our sparse TR model can achieve performance equal to, if not better than, that of the deep-learning-based QSAR models, and that it provides a more intuitive interpretation by extracting an approximate isometry between the chemical space of the drugs and their activity space.
-
The questions of how healthy colonic crypts maintain their size, and how homeostasis is disrupted by driver mutations, are central to understanding colorectal tumorigenesis. We propose a three-type stochastic branching process, which accounts for stem, transit-amplifying (TA) and fully differentiated (FD) cells, to model the dynamics of cell populations residing in colonic crypts. Our model is simple in its formulation, allowing us to estimate all but one of the model parameters from the literature. Fitting the single remaining parameter, we find that model results agree well with data from healthy human colonic crypts, capturing the considerable variance in population sizes observed experimentally. Importantly, our model predicts a steady-state population in healthy colonic crypts for relevant parameter values. We show that APC and KRAS mutations, the most significant early alterations leading to colorectal cancer, result in increased steady-state populations in mutated crypts, in agreement with experimental results. Finally, our model predicts a simple condition for unbounded growth of cells in a crypt, corresponding to colorectal malignancy. This is predicted to occur when the division rate of TA cells exceeds their differentiation rate, with implications for therapeutic cancer prevention strategies.
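The growth condition stated in this abstract can be illustrated with a minimal mean-field sketch. This is not the authors' model: it assumes a hypothetical deterministic equation for the expected TA population, d(TA)/dt = s*N + (b - d)*TA, where N (stem cell count), s (rate at which each stem cell produces TA cells), b (TA division rate) and d (TA differentiation rate) are illustrative names and values chosen here. Under that assumption, a finite steady state exists only when d exceeds b.

```python
def steady_state_ta(stem_cells, stem_output_rate, division_rate, differentiation_rate):
    """Steady-state mean TA count for the assumed equation
    d(TA)/dt = s*N + (b - d)*TA; returns None when no finite steady state exists."""
    net_loss = differentiation_rate - division_rate
    if net_loss <= 0:
        return None  # division >= differentiation: the mean grows without bound
    return stem_output_rate * stem_cells / net_loss


def mean_ta_cells(stem_cells, stem_output_rate, division_rate,
                  differentiation_rate, t_end, dt=0.01):
    """Forward-Euler integration of the same assumed equation, starting from TA = 0."""
    ta = 0.0
    for _ in range(int(round(t_end / dt))):
        growth = stem_output_rate * stem_cells
        growth += (division_rate - differentiation_rate) * ta
        ta += dt * growth
    return ta


# Hypothetical parameter values, chosen only for illustration.
bounded = mean_ta_cells(10, 1.0, 0.5, 1.0, t_end=50)    # d > b: settles near s*N/(d - b)
unbounded = mean_ta_cells(10, 1.0, 1.0, 0.5, t_end=50)  # b > d: keeps growing
```

With these illustrative values, the first run converges to the value returned by `steady_state_ta`, while the second grows exponentially, mirroring the qualitative condition described in the abstract.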
-
We study a multi-stage model for the development of colorectal cancer from initially healthy tissue. The model incorporates a complex sequence of driver gene alterations, some of which result in immediate growth advantage, while others have initially neutral effects. We derive analytic estimates for the sizes of premalignant subpopulations, and use these results to compute the waiting times to premalignant and malignant genotypes. This work contributes to the quantitative understanding of colorectal tumor evolution and the lifetime risk of colorectal cancer.
-
Predicting protein properties from amino acid sequences is an important problem in biology and pharmacology. Protein–protein interactions among the SARS-CoV-2 spike protein, human receptors and antibodies are key determinants of the potency of this virus and its ability to evade the human immune response. As a rapidly evolving virus, SARS-CoV-2 has already developed into many variants that differ considerably in virulence. Utilizing the proteomic data of SARS-CoV-2 to predict its viral characteristics will, therefore, greatly aid in disease control and prevention. In this paper, we review and compare recent successful prediction methods based on long short-term memory (LSTM), transformer, convolutional neural network (CNN) and a similarity-based topological regression (TR) model, and offer recommendations about appropriate predictive methodology depending on the similarity between training and test datasets. We compare the effectiveness of these models in predicting the binding affinity and expression of SARS-CoV-2 spike protein sequences. We also explore how effective these predictive methods are when trained on laboratory-created data and tasked with predicting the binding affinity of in-the-wild SARS-CoV-2 spike protein sequences obtained from the GISAID datasets. We observe that TR is the better method when the sample size is small and the test protein sequences are sufficiently similar to the training sequences. However, when the training sample size is sufficiently large and prediction requires extrapolation, LSTM embedding and CNN-based predictive models show superior performance.
